
    Image Retrieval: Modelling Keywords via Low-level Features

    Get PDF
    Advisors: Nicolas Tsapatsoulis. Date and location of PhD thesis defense: 29 April 2014, Cyprus University of Technology.

    With the advent of cheap digital recording and storage devices and the rapidly increasing popularity of online social networks that make extensive use of visual information, such as Facebook and Instagram, image retrieval has regained great attention among researchers in the areas of image indexing and retrieval. Image retrieval methods fall mainly into content-based and text-based frameworks. Although content-based image retrieval has attracted a large amount of research interest, the difficulties in querying by example propel end users towards text queries. Searching by text queries yields more effective and accurate results that meet the needs of the users while at the same time preserving their familiarity with the way traditional search engines operate. However, text-based image retrieval requires images to be annotated, i.e. related to text information. Much effort has been invested in automatic image annotation methods [1], since the manual assignment of keywords (which is necessary for text-based image retrieval) is a time-consuming and labour-intensive procedure [2].

    In automatic image annotation, a manually annotated dataset is used to train a system to identify the joint or conditional probability of an annotation occurring together with a certain distribution of feature vectors corresponding to image content [3]. Different models and machine learning techniques have been developed to learn the correlation between image features and textual words from examples of annotated images; the learned models are then applied to predict keywords for unseen images [4]. In the literature of automatic semantic image annotation, proposed approaches tend to classify images using only abstract terms, or using holistic image features for both abstract terms and object classes. The extraction and selection of low-level features, either holistic or from particular image areas, is of primary importance for automatic image annotation. This holds for both the content-based and the text-based retrieval paradigm: in the former case, appropriate low-level features lead to accurate and effective object class models for object detection, while in the latter case, the better the low-level features are, the easier the learning of keyword models becomes.

    The intent of image classification is to categorize the content of the input image into one of several keyword classes. A proper image annotation may contain more than one keyword relevant to the image content, so a reclassification process is required in this case, as well as whenever a new keyword class is added to the classification scheme. Creating separate visual models for all keyword classes adds significant value to automatic image annotation, since several keywords can then be assigned to the input image: as the number of keyword classes increases, the number of keywords assigned to the images also increases, and no reclassification is needed. However, keyword modeling raises various issues, such as the large amount of manual effort required to develop the training data, differences in the interpretation of image contents, and inconsistency of keyword assignments among different annotators.

    This thesis focuses on image retrieval using keywords from a machine learning perspective. It covers different aspects of current research in this area, including low-level feature extraction, creation of training sets and development of machine learning methodologies. It also proposes the idea of addressing automatic image annotation by creating visual models, one for each available keyword, and presents several examples of the proposed idea by comparing different features and machine learning algorithms in creating visual models for keywords from the athletics domain.

    The idea of automatic image annotation through independent keyword visual models is divided into two main parts: training and automatic image annotation. In the first part, visual models for all available keywords are created using the one-against-all training paradigm; in the second part, annotations are produced for a given image based on the output of these models once they are fed with a feature vector extracted from the input image. An accurate manually annotated dataset containing pairs of images and annotations is a prerequisite for successful automatic image annotation. Since manual annotations are likely to contain human judgment errors and subjectivity in interpreting the image, this thesis investigates the factors that influence the creation of manually annotated image datasets [5]. It also proposes modeling the knowledge of several people by creating visual models from such training data, aiming to significantly improve the ultimate efficiency of image retrieval systems [6].

    Moreover, the thesis proposes a new algorithm for the extraction of low-level features. The Spatial Histogram of Keypoints (SHiK) [7] keeps the spatial information of localized keypoints, in an effort to overcome the limitations caused by the non-fixed and huge dimensionality of the SIFT feature vector when used in machine learning frameworks. SHiK partitions the image into a fixed number of ordered sub-regions based on the Hilbert space-filling curve and counts the localized keypoints found inside each sub-region. The resulting spatial histogram is a compact and discriminative low-level feature vector that shows significantly improved performance on classification tasks.
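    A minimal sketch of the one-against-all keyword-modeling scheme outlined above, assuming scikit-learn; `features`, `labels` and `keywords` are hypothetical inputs, and `LinearSVC` is an illustrative stand-in rather than the thesis's actual learner:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_keyword_models(features, labels, keywords):
    """One-against-all training: build one binary visual model per
    keyword, treating images annotated with the keyword as positives
    and all remaining images as negatives."""
    models = {}
    for kw in keywords:
        y = np.array([1 if kw in img_kws else 0 for img_kws in labels])
        models[kw] = LinearSVC().fit(features, y)
    return models

def annotate(models, feature_vector, threshold=0.0):
    """Annotate an unseen image: every keyword whose model fires on the
    image's feature vector is assigned. Adding a new keyword class only
    requires training one extra model; no reclassification is needed."""
    x = np.asarray(feature_vector).reshape(1, -1)
    return [kw for kw, model in models.items()
            if model.decision_function(x)[0] > threshold]
```

    Because each keyword model is scored independently, several keywords can be assigned to the same image, which is what makes the scheme scale without retraining existing models.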

    Image retrieval: modelling keywords via low-level features

    No full text
    Although Content-Based Image Retrieval (CBIR) has attracted a large amount of research interest, the difficulties in querying by an example propel end users towards text queries. Searching by text queries yields more effective and accurate results that meet the needs of the users while at the same time preserving their familiarity with the way traditional search engines operate. However, text-based image retrieval requires images to be annotated, i.e. related to text information. In recent years, much effort has been invested in automatic image annotation methods, since the manual assignment of keywords (which is necessary for text-based image retrieval) is a time-consuming and labour-intensive procedure. This thesis focuses on image retrieval from a machine learning perspective and covers different aspects of this area. It discusses and presents several studies referring to: (a) low-level feature extraction and selection for the task of automatic annotation of images, (b) training algorithms that can be utilized for keyword modeling based on visual content, and (c) the creation of appropriate and reliable training data, to be used with the training scheme, with the least manual effort. The main contribution is a new framework that addresses the key issues in automatic keyword extraction by creating separate visual models for all available keywords using the one-against-all paradigm, to account for the scalability and multiple-keyword-assignment problems. The prospective reader of this thesis will be equipped to identify the key issues in automatic image annotation and will be prompted to think ahead and propose alternative solutions. Furthermore, this thesis can serve as a guide for researchers who want to experiment with automatic keyword assignment to digital images. Committee: Constantinos S. Pattichis (President), Andreas Lanitis (Member).

    Crowdsourcing annotation: modelling keywords using low level features

    No full text
    Tagging large collections is often prohibitive, and manual tags are known to be imprecise, ambiguous, inconsistent and subject to many variations. A possible way to alleviate these problems and improve annotation quality is to obtain multiple annotations per image by assigning several annotators to the task. In the current work we present an approach to modeling the view of several annotators using four MPEG-7 descriptors and a well-known data classifier. We apply keyword modelling to the annotation data collected in the framework of the Commandaria project, where sixteen non-expert users annotated a set of a hundred images using a predefined set of keywords. The images sharing a common keyword are grouped together and used to create the visual model corresponding to that keyword. Finally, the created models are used to classify the images into the keyword classes in terms of two-class combinations using the 10-fold cross-validation technique. The experimental results are examined from two perspectives: first, the separation ability of the various keyword classes, and second, the efficiency of the four visual descriptors for the image classification task.
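    A rough sketch of the two-class evaluation described above, assuming scikit-learn; `X` (feature matrix), `y` (keyword labels) and `keywords` are hypothetical stand-ins for the extracted descriptors and collected annotations, and `SVC` substitutes for the unnamed classifier:

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pairwise_separability(X, y, keywords, folds=10):
    """For every two-keyword combination, run k-fold cross-validation
    on just the images of those two classes and record the mean
    accuracy, i.e. how well the two keyword classes separate."""
    X, y = np.asarray(X), np.asarray(y)
    results = {}
    for a, b in combinations(keywords, 2):
        mask = np.isin(y, [a, b])  # keep only images of the pair
        scores = cross_val_score(SVC(), X[mask], y[mask], cv=folds)
        results[(a, b)] = scores.mean()
    return results
```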

    On the creation of visual models for keywords through crowdsourcing

    No full text
    Crowdsourcing annotation is a recent development, since a complete and elaborate annotation of the content of an image is an extremely labour-intensive and time-consuming task. In this paper we examine the possibility of building accurate visual models for keywords through crowdsourcing. Specifically, 8 different keywords related to the athletics domain have been modelled using MPEG-7 and Histogram of Oriented Gradients (HOG) low-level features and the Sequential Minimal Optimization (SMO) classifier. The experimental results have been examined using accuracy metrics and are very promising, showing the ability of the visual models to classify the images into the 8 classes, with the highest average accuracy rate of 73.13% achieved with the HOG features.
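    A hedged sketch of such a HOG-plus-SMO pipeline, using scikit-image's HOG extractor and scikit-learn's `SVC` (whose libsvm backend is an SMO-type solver) as stand-ins; `images`, `y` and the 128x128 resize are assumptions, not details from the paper:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def hog_features(images, size=(128, 128)):
    """Resize each grayscale image to a common size and extract HOG
    descriptors, so all feature vectors share one dimensionality."""
    return np.array([hog(resize(img, size)) for img in images])

# Hypothetical usage: `images` is a list of grayscale arrays and `y`
# holds one of the 8 athletics keywords per image.
# X = hog_features(images)
# scores = cross_val_score(SVC(kernel="linear"), X, y, cv=10)
# print("mean accuracy:", scores.mean())
```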

    Object classification using the MPEG-7 visual descriptors: an experimental evaluation using state of the art data classifiers

    No full text
    The MPEG-7 visual descriptors cover color, texture and shape, and were introduced, after a long period of evaluation, for efficient content-based image retrieval. A total of 22 different kinds of features are included: nine for color, eight for texture and five for shape. The encoded values of these features vary significantly, and their combination, as a means for better retrieval, is neither straightforward nor efficient. Despite their extensive usage, the MPEG-7 visual descriptors have never been compared in terms of their retrieval performance; thus, the question of which descriptor to use for a particular image retrieval scenario remains unanswered. In this paper we report the results of an extended experimental study on the efficiency of the various MPEG-7 visual features, with the aid of the Weka tool and a variety of well-known data classifiers. Our data consist of 1952 images from the athletics domain, containing 7686 manually annotated objects corresponding to eight different classes. The results indicate that a combination of selected MPEG-7 visual features may lead to increased retrieval performance compared to single descriptors, but this is not a general fact. Furthermore, although the models created using alternative training schemes have similar performance, libSVM is by far the most effective in model creation in terms of training time and robustness to parameter variation.
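    A sketch of this kind of descriptor-versus-classifier comparison, written against scikit-learn rather than Weka; `descriptor_sets` (a mapping from descriptor name to feature matrix) and `y` are hypothetical, and the classifier lineup is illustrative, not the paper's exact list:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare(descriptor_sets, y, folds=10):
    """Cross-validated accuracy for each (descriptor, classifier)
    pair, plus one row for the concatenation of all descriptors."""
    classifiers = {
        "SVC (libSVM-style)": SVC(),
        "decision tree": DecisionTreeClassifier(),
        "naive Bayes": GaussianNB(),
        "k-NN": KNeighborsClassifier(),
    }
    table = {}
    for d_name, X in descriptor_sets.items():
        for c_name, clf in classifiers.items():
            table[(d_name, c_name)] = cross_val_score(clf, X, y, cv=folds).mean()
    # Combining selected descriptors: concatenate the feature columns.
    combined = np.hstack(list(descriptor_sets.values()))
    table[("combined", "SVC (libSVM-style)")] = cross_val_score(SVC(), combined, y, cv=folds).mean()
    return table
```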

    Visual Lifelogs Retrieval: State of the Art and Future Challenges

    No full text
    Wearable cameras nowadays cover several areas of application, and the need for smart applications that support the sustainability and well-being of citizens is greater than ever before. Extracting valuable knowledge about the everyday life of wearers from the tremendous amount of lifelogging data requires state-of-the-art retrieval techniques to efficiently store, access, search and retrieve useful information. Several works have been proposed that combine computer vision and machine learning techniques to analyze the content of the data captured by visual wearable devices on a daily basis. This paper presents an overview of the progress in visual lifelogging retrieval, indicates the current advances and future challenges, and highlights the prospects of incorporating visual lifelogging retrieval in social computing applications.

    Spatial histogram of keypoints

    No full text
    Among the variety of feature extraction approaches, special attention has been given to the SIFT algorithm, which delivers good results for many applications. However, the non-fixed and huge dimensionality of the extracted SIFT feature vector causes certain limitations when it is used in machine learning frameworks. In this paper, we introduce the Spatial Histogram of Keypoints (SHiK), which keeps the spatial information of localized keypoints, in an effort to overcome this limitation. The proposed technique partitions the image into a fixed number of ordered sub-regions based on the Hilbert space-filling curve and counts the localized keypoints found inside each sub-region. The resulting spatial histogram is a compact and discriminative low-level feature vector that shows significantly improved performance on classification tasks. The proposed method achieves high accuracy on different datasets and performs significantly better on scene datasets compared to the Spatial Pyramid Matching method.
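    A compact sketch of a SHiK-style descriptor under stated assumptions: OpenCV's SIFT detector, an 8x8 grid (the grid size is a guess, not taken from the paper), and the classic bit-twiddling Hilbert-index mapping:

```python
import cv2  # OpenCV; SIFT is in the main API since version 4.4
import numpy as np

def xy2d(n, x, y):
    """Index of grid cell (x, y) along a Hilbert curve covering an
    n x n grid (n a power of two); classic iterative algorithm."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so lower bits line up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def shik(image_bgr, grid=8):
    """SHiK-style histogram: detect SIFT keypoints, assign each to one
    of grid*grid Hilbert-ordered sub-regions, count the keypoints per
    sub-region, and normalize into a fixed-length feature vector."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    keypoints = cv2.SIFT_create().detect(gray, None)
    hist = np.zeros(grid * grid)
    for kp in keypoints:
        cx = min(int(kp.pt[0] * grid / w), grid - 1)
        cy = min(int(kp.pt[1] * grid / h), grid - 1)
        hist[xy2d(grid, cx, cy)] += 1
    return hist / hist.sum() if hist.sum() > 0 else hist
```

    Ordering the sub-regions along the Hilbert curve, rather than row by row, keeps spatially adjacent cells close together in the histogram, which is the property the descriptor relies on.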

    Image annotation: the effects of content, lexicon and annotation method

    No full text
    Image annotation is the process of assigning metadata to images, allowing effective retrieval by text-based search techniques. Despite extensive efforts in automatic multimedia analysis, automatic semantic annotation of multimedia is still inefficient due to the problems in modeling high-level semantic terms. In this paper, we examine the factors affecting the quality of annotations collected through crowdsourcing platforms. An image dataset was manually annotated using: (1) a vocabulary consisting of a preselected set of keywords, (2) a hierarchical vocabulary, and (3) free keywords. The results show that annotation quality is affected by the image content itself and the lexicon used. As expected, while annotation using the hierarchical vocabulary is more representative, the use of free keywords leads to more invalid annotations. Finally, it is shown that images requiring annotations that are not directly related to their content (i.e., annotation using abstract concepts) lead to increased annotator inconsistency, revealing that the difficulty in annotating such images is not limited to automatic annotation but is a generic problem of annotation.